Unreliable failure detectors for asynchronous distributed systems

Authors

  • David Baelde
  • Franck Petit
  • Vincent Villain
Abstract

Distributed computing is very attractive, but it comes with new problems: information loss, overflow, or breakdowns. Most often, these problems are neglected. Indeed, it has been shown that Consensus (a fundamental problem which requires that the processes agree on a common value) is unsolvable in a realistic computing model, i.e. a completely asynchronous one with possible crash failures [FLP85]. Intuitively, in an asynchronous environment, a process cannot determine whether a component has crashed or is merely very slow. Several approaches have been designed to “bypass” that impossibility. One of them is self-stabilization, studied at LaRIA, which deals with transient faults. The principle is to design algorithms that can be started from any initial state and eventually behave according to their specification. Snap-stabilization is stronger: from any initial state, the algorithm always behaves according to its specification. The first snap-stabilized algorithms were designed at LaRIA. Another approach, which we are going to study, copes with definitive (crash) failures. Ideally, a black box would be attached to each process to indicate precisely the failures in the network. This black box is called a failure detector. However, the result of [FLP85] implies that it is impossible to implement such a perfect failure detector. That is why Chandra and Toueg introduced in [CHT96] the notion of unreliable failure detectors. Even though such detectors are still impossible to implement, in practice this approach makes it possible to implement semi-algorithms. In theory, this approach also makes it possible to introduce a hierarchy of the unreliable
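To make the idea of an unreliable failure detector concrete, here is a minimal sketch of the common heartbeat-and-timeout approach. It is not taken from the paper: the class name `HeartbeatDetector`, its methods, and the timeout-doubling rule are assumptions chosen for illustration. The detector suspects a peer whose heartbeat is overdue, and when a suspected peer turns out to be merely slow it retracts the suspicion and waits longer next time. In a fully asynchronous system this gives no accuracy guarantee, which is exactly why such a detector is called unreliable.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Illustrative timeout-based suspicion list kept by one monitoring process."""

    initial_timeout: float = 1.0
    timeout: dict = field(default_factory=dict)    # peer -> current timeout
    last_seen: dict = field(default_factory=dict)  # peer -> time of last heartbeat
    suspected: set = field(default_factory=set)    # peers currently suspected

    def add_peer(self, peer, now):
        """Start monitoring a peer, treating `now` as its first heartbeat."""
        self.timeout[peer] = self.initial_timeout
        self.last_seen[peer] = now

    def on_heartbeat(self, peer, now):
        """Record a heartbeat; retract a wrong suspicion and become more patient."""
        self.last_seen[peer] = now
        if peer in self.suspected:
            # The peer was slow, not crashed: the earlier suspicion was a mistake.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def check(self, now):
        """Suspect every peer whose last heartbeat is older than its timeout."""
        for peer, seen in self.last_seen.items():
            if now - seen > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)
```

With a peer `q` added at time 0 and a timeout of 1.0, `check(2.0)` suspects `q` even though it may only be slow. A later heartbeat from `q` clears the suspicion and doubles the timeout for `q`. A peer that has really crashed never sends another heartbeat and therefore stays suspected.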

Similar resources

About the Relationship between Election Problem and Failure Detector in Asynchronous Distributed Systems

This paper is about the relationship between Election problem and Failure Detector in asynchronous distributed systems. We first discuss the relationship between the Election problem and the Consensus problem in asynchronous distributed systems with unreliable failure detectors. Chandra and Toueg have stated that Consensus is solvable in asynchronous systems with unreliable failure detectors. B...

On the Respective Power of ◇P and ◇S to Solve One-Shot Agreement Problems

Unreliable failure detectors are abstract devices that, when added to asynchronous distributed systems, make it possible to solve distributed computing problems (e.g. Consensus) that would otherwise be impossible to solve in these systems. This paper focuses on two classes of failure detectors defined by Chandra and Toueg, namely the classes denoted ◇P (eventually perfect) and ◇S (eventually strong). Bot...

Fast Asynchronous Uniform Consensus in Real-Time Distributed Systems

We investigate whether asynchronous computational models and asynchronous algorithms can be considered for designing real-time distributed fault-tolerant systems. A priori, the lack of bounded finite delays is antagonistic with timeliness requirements. We show how to circumvent this apparent contradiction, via the principle of “late binding” of a solution to some (partially) synchronous model. ...

Unreliable Failure Detectors via Operational Semantics

The concept of unreliable failure detectors for reliable distributed systems was introduced by Chandra and Toueg as a fine-grained means to add weak forms of synchrony into asynchronous systems. Various kinds of such failure detectors have been identified as each being the weakest to solve some specific distributed programming problem. In this paper, we provide a fresh look at failure detectors...

Implementing unreliable failure detectors with unknown membership

Unreliable failure detectors [3] are useful devices to solve several fundamental problems in fault-tolerant distributed computing, like consensus or atomic broadcast. In their original work [3], Chandra and Toueg proposed 8 different classes of unreliable failure detectors, and showed that all of them can be used to solve consensus in a crash-prone asynchronous system with reliable links. All t...

Stubborn Communication Channels

This paper aims at bridging the gap between the assumption of reliable channels by fault-tolerant distributed algorithms and the weak reliability of feasible communication channels. We define a new kind of communication channels which we call Stubborn channels. Stubborn channels are easily implementable over a connectionless network layer and, although weak, the reliability guarantees offered by ...

Publication date: 2003